Skip to content

Conversation

@fm3
Copy link
Member

@fm3 fm3 commented Oct 15, 2025

Worker missing notifications now

  • report the duration since the last heartbeat
  • are sent again after 1 hour

Steps to test:

(I already tested locally)

  • run yarn enable-jobs but do not start a worker
  • Adapt timings in application.conf
  • Watch backend logging for worker missing warnings.

  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases

@fm3 fm3 self-assigned this Oct 15, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 15, 2025

📝 Walkthrough

Walkthrough

Replaces in-memory per-worker dead tracking with a TemporaryStore-backed persistent store, adds a configurable re-report interval, updates WorkerLivenessService constructor and message formatting with a heartbeat formatter, and adds a new Jobs setting in WkConf plus its value in application.conf.

Changes

Cohort / File(s) Change summary
Worker liveness persistence and reporting
app/models/job/Worker.scala
Replaces in-memory reportedAsDead Set with a TemporaryStore[ObjectId, Unit]. reportIfLivenessChanged now queries/updates the TemporaryStore and enforces a re-report interval from conf.Jobs.workerLivenessReReportInterval. Adds formatHeartbeat(worker) helper and updates dead/resurrected message formatting. WorkerLivenessService constructor now accepts reportedAsDeadTemporaryStore: TemporaryStore[ObjectId, Unit] and conf: WkConf.
Configuration: Jobs settings
app/utils/WkConf.scala
Adds Jobs.workerLivenessReReportInterval: FiniteDuration, read from jobs.workerLivenessReReportInterval.
Runtime config value
conf/application.conf
Sets silhouette.jobs.workerLivenessReReportInterval = 1 hour.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to WorkerLivenessService constructor changes and all call sites that instantiate it.
  • Verify TemporaryStore semantics (insert/contains/remove) and time-bound re-report logic.
  • Check message formatting changes and time/locale handling in formatHeartbeat.

Poem

I hop and I watch each tiny beat,
I store the pauses where paths may meet.
From burrowed cache to hourly chime,
I mark the lost, then mark the time.
A rabbit cheers: persistent and spry 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Improve 'Worker missing' notifications" is directly related to the main changes in the changeset. The raw summary shows that the PR modifies worker liveness tracking to persist dead-worker state via TemporaryStore, adds a formatHeartbeat helper to display duration since last heartbeat, and introduces a workerLivenessReReportInterval configuration for re-reporting. The PR description confirms these improvements: notifications now report the heartbeat duration and are re-sent after 1 hour. The title accurately captures the primary change at a high level and is concise and clear without unnecessary noise.
Description Check ✅ Passed The description is clearly related to the changeset and provides meaningful information about the changes. It explicitly lists the two main improvements: reporting the duration since last heartbeat and re-sending notifications after 1 hour, which directly correspond to the formatHeartbeat helper addition and the new workerLivenessReReportInterval configuration shown in the raw summary. The description also includes concrete testing steps and references to the completion of the development checklist, demonstrating that the author has validated the changes and prepared the PR appropriately.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch better-worker-missing-notifications

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cdc310 and 3c2f634.

📒 Files selected for processing (2)
  • app/utils/WkConf.scala (1 hunks)
  • conf/application.conf (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (1)
app/utils/WkConf.scala (1)
util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1)
  • get (23-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (1)
app/utils/WkConf.scala (1)

227-227: Configuration verified; code change is correct.

The configuration key workerLivenessReReportInterval is properly defined in conf/application.conf at line 324 with the value 1 hour, which aligns with the PR objectives. The code change follows the correct pattern and will function as intended at runtime.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@fm3 fm3 marked this pull request as ready for review October 15, 2025 08:54
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccf1c28 and 3cae1eb.

📒 Files selected for processing (3)
  • app/models/job/Worker.scala (3 hunks)
  • app/utils/WkConf.scala (1 hunks)
  • conf/application.conf (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
app/utils/WkConf.scala (1)
util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1)
  • get (23-26)
app/models/job/Worker.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/TemporaryStore.scala (4)
  • TemporaryStore (9-83)
  • contains (18-23)
  • insert (55-60)
  • remove (69-74)
app/utils/WkConf.scala (1)
  • Jobs (224-228)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (3)
  • warn (18-21)
  • success (24-27)
  • info (21-24)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (3)
  • Instant (14-45)
  • Instant (47-103)
  • since (68-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: backend-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants